Automatically Tuned Dense Linear Algebra for Multicore+GPU

Authors

  • Xing Fu
  • Xue Li
  • Gregory D. Peterson
Abstract

The Multicore+GPU architecture has been adopted in some of the fastest supercomputers on the TOP500 list. The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures such as Multicore+GPU. However, providing portable performance currently requires manual parameter tuning. This paper presents an automatically tuned LU factorization: the key parameter of the factorization is tuned automatically to optimize performance for a particular GPU platform. Moreover, we propose a work stealing scheme and GREENsynchronization to decrease the power consumption of the LU factorization and accelerate the entire application.

Similar resources

Towards dense linear algebra for hybrid GPU accelerated manycore systems

doi:10.1016/j.parco.2009.12.005

We highlight the trends leading to the increased appeal of using hybrid multicore + GPU systems for high performance computing. We present a set of techniques that can be used to develop efficient dense linear algebra alg...

One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators

One-sided dense matrix factorizations are important computational kernels in many scientific and engineering simulations. In this paper, we propose two extensions of both right-looking (LU and QR) and left-looking (Cholesky) one-sided factorization algorithms to utilize the computing power of current heterogeneous architectures. We first describe a new class of non-GPU-resident algorithms that ...

One-sided dense matrix factorizations on a multicore with multiple GPU accelerators in MAGMA

One-sided dense matrix factorizations are important computational kernels in many scientific and engineering simulations. In this paper, we propose two extensions of both right-looking (LU and QR) and left-looking (Cholesky) factorization algorithms to utilize the computing power of current heterogeneous architectures. We first describe a new class of non-GPU-resident algorithms that factorize ...

OpenCL Evaluation for Numerical Linear Algebra Library Development

With the help of CUDA [7], [6], many applications have improved their performance by using GPUs. In our project called Matrix Algebra on GPU and Multicore Architectures (MAGMA) [10], we mainly focus on dense linear algebra routines similar to those from LAPACK [1]. Besides CUDA, there exist other frameworks that allow platform-independent programming for GPUs. The three main frameworks are: 1) ...

A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems

Aiming to fully exploit the computing power of all CPUs and all GPUs on hybrid CPU-GPU systems to solve dense linear algebra problems, we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, as well as to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentr...

Publication date: 2010